Beyond Terms: Multi-Word Units in MultiTerm Extract

Authors

  • María Fernández Parra
  • Pius ten Hacken
Abstract

Multi-word units are lexical units that are written as more than one word. They constitute a rather heterogeneous class, whose only unifying feature is that they represent a mismatch between orthographic representation and lexical units. Included in this class are syntactically governed combinations (e.g. correspond with), complex prepositions (e.g. in spite of), collocations (e.g. put into practice), idioms (e.g. have a bee in one's bonnet), etc.

Similar resources

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

Preparatory Work on Automatic Extraction of Bilingual Multi-Word Units from Parallel Corpora

Automatic extraction of bilingual Multi-Word Units is an important subject of research in the automatic bilingual corpus alignment field. There are many cases of single source words corresponding to target multi-word units. This paper presents an algorithm for the automatic alignment of single source words and target multi-word units from a sentence-aligned parallel spoken language corpus. On t...

Extracting Chinese Multi-Word Units from Large-Scale Balanced Corpus

Automatic Multi-word Units Extraction is an important issue in Natural Language Processing. This paper has proposed a new statistical method based on a large-scale balanced corpus to extract multi-word units. We have used two improved traditional parameters: mutual information and log-likelihood ratio, and have increased the precision for the top 10,000 words extracted through the method to 80....

Identifying Fixed Expressions: A Comparison of SDL MultiTerm Extract and Déjà Vu’s Lexicon

The term fixed expression refers to a formally quite heterogeneous group of expressions, such as adjective-noun collocations (e.g. heavy smoker), prepositional expressions (e.g. in spite of), verbal expressions (e.g. break the ice), dual expressions (e.g. black and white), foreign phrases (e.g. per capita), etc. The properties that unite them are that they consist of more than one word and are ...

Towards Bilingual Term Extraction in Comparable Patents

To extract bilingual terms from a corpus of comparable patents, we present a novel framework in this paper. The framework includes the following major steps: 1) extract monolingual single-word and multi-word term candidates in monolingual patents; 2) find parallel sentences in comparable patents; 3) extract bilingual single-word and multi-word term candidates; 4) identify correct bilingu...

Publication date: 2011